A Novel Gene Prognostic Signature Based on Differential DNA Methylation in Breast Cancer

Background: DNA methylation played essential roles in regulating gene expression. The impact of DNA methylation status on the occurrence and development of cancers has been well demonstrated. However, little is known about its prognostic role in breast cancer (BC). Materials: The Illumina Human Methylation450 array (450k array) data of BC was downloaded from the UCSC xena database. Transcriptomic data of BC was downloaded from the Cancer Genome Atlas (TCGA) database. Firstly, we used univariate and multivariate Cox regression analysis to screen out independent prognostic CpGs, and then we identified methylation-associated prognosis subgroups by consensus clustering. Next, a methylation prognostic model was developed using multivariate Cox analysis and was validated with the Illumina Human Methylation27 array (27k array) dataset of BC. We then screened out differentially expressed genes (DEGs) between methylation high-risk and low-risk groups and constructed a methylation-based gene prognostic signature. Further, we validated the gene signature with three subgroups of the TCGA-BRCA dataset and an external dataset GSE146558 from the Gene Expression Omnibus (GEO) database. Results: We established a methylation prognostic signature and a methylation-based gene prognostic signature, and there was a close positive correlation between them. The gene prognostic signature involved six genes: IRF2, KCNJ11, ZDHHC9, LRP11, PCMT1, and TMEM70. We verified their expression in mRNA and protein levels in BC. Both methylation and methylation-based gene prognostic signatures showed good prognostic stratification ability. The AUC values of 3-years, 5-years overall survival (OS) were 0.737, 0.744 in the methylation signature and 0.725, 0.715 in the gene signature, respectively. In the validation groups, high-risk patients were confirmed to have poorer OS. The AUC values of 3 years were 0.757, 0.735, 0.733 in the three subgroups of TCGA dataset and 0.635 in GSE146558 dataset. Conclusion: This study revealed the DNA methylation landscape and established promising methylation and methylation-based gene prognostic signatures that could serve as potential prognostic biomarkers and therapeutic targets.


INTRODUCTION
Breast cancer (BC) is the most prevalent cancer and the leading cause of cancer mortality in women worldwide (Freddie et al., 2020). In the United States, it is estimated that around 13% of women will suffer from BC in their lifetime (DeSantis et al., 2019). In recent years, the mortality of BC has gradually declined and the 5-years OS rate has reaches 90% attributed to early detection and improved treatment (Allemani et al., 2018).
Breast cancers are categorized into ER+, ER+/HER2-, HER2+ and Triple-negative subtypes based on the expression of estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2). Similarly, gene expression analysis of these receptors further recognizes four subsets of BC: luminal A, luminal B, HER2-enriched (HER2-E) and Basal-like (Parker et al., 2009). These classification systems not only help predict the prognosis of BC patients, but also guide treatment choices. Conventional therapies such as surgery, radiotherapy, and chemotherapy form the basis of BC treatment. In addition, endocrine therapy for hormone receptor-positive BC and anti-HER2 treatment for HER2 expressing BC have greatly improved the prognosis of patients. Unfortunately, triple negative breast cancer (TNBC) still lacks effective therapeutic targets. Recent studies demonstrated that poly ADP-ribose polymerase 1 (PARP1) inhibitors and immune checkpoint inhibitors (ICIs) showed potential effect in TNBC (Mittendorf et al., 2020), (Schmid et al., 2020). Despite the great achievements in treatment, about 25-40% of BC patients will develop metastases (Siegel et al., 20172017). Among them, bone metastases are the most common, and approximately 75% of late-stage BC patients are diagnosed with bone metastases (Tulotta and Ottewell, 2018). Moreover, 5-20% of BC patients would have brain metastases (Achrol et al., 2019). Once the patient develops metastasis, the prognosis is poor, with the median OS of only 1-2 years (Redig and McAllister, 2013), (Martin et al., 2017). Therefore, it is urgent to find potential prognosis-related biomarkers to accurately predict the prognosis of BC patients.
Epigenetics events such as DNA methylation, histone modifications, chromatin remodeling, and non-coding RNAs play essential roles in the regulation of gene expression and actively participate in the development and progression of cancers. DNA methylation, which affects gene expression without changing the DNA sequence, is the most common epigenetic modification (Nakao, 2001), (Strahl and Allis, 2000). Stefansson et al. demonstrated that abnormal methylation of CpG islands in the promoter regions might activate proto-oncogenes or silence tumor suppressor genes, thereby contributing to the occurrence and development of tumors (Stefansson and Esteller, 2013). Accumulating evidences showed that decreased levels of genome-wide methylation were a critical sign of early cancers and were related to cancer grade and metastasis (Yang and Schwartz, 2011), (Ding et al., 2019). Indeed, DNA methylation was associated with most malignancies including bladder cancer (Chen et al., 2020a), lung cancer (Liang et al., 2019), and gastrointestinal tumors (Woo et al., 2018), (Huang et al., 2018).
Emerging studies have revealed the important roles of DNA methylation in BC (Pasculli et al., 2018;Kresovich et al., 2019;Xu et al., 2020). For instance, distinct DNA methylation patterns and associated gene expression profiles were found in different molecular subtypes of BC. SFRP1, a tumor suppressor gene, was down-regulated by hypermethylation in ER + breast cancer, leading poor prognosis (Stefansson et al., 2015), (Park et al., 2012). Other genes such as BRCA1, CDH1, and PTEN, were also abnormally methylated in BC. These events could serve as potential therapeutic and prognostic biomarkers (FitzGerald et al., 1998;Pharoah et al., 2001;King et al., 2003;Walsh et al., 2006;Suijkerbuijk et al., 2008;Luo et al., 2016). However, the prognostic role of DNA methylation in BC remains incompletely demonstrated.
In this study, we used bioinformatics methods to determine the prognostic role of DNA methylation and constructed methylation-associated prognostic signatures for BC. This study will help unveil the significance of DNA methylation in BC and might help discover novel prognostic biomarkers.

Data Acquisition and Processing
RNA-seq data in fragments per kilobase of transcript per million mapped reads (FPKM) form and clinical information of BC were downloaded from the Cancer Genome Atlas (TCGA: https:// portal.gdc.cancer.gov/) database. Illumina Human Methylation450 BeadChip array (450k array) and Illumina Human Methylation27 BeadChip array (27k array) data of TCGA database were downloaded from UCSC xena (https:// xenabrowser.net/) (Goldman et al., 2020). DNA Methylation levels were evaluated by the β value, which ranged from 0 to 1 (0 means unmethylated and 1 means fully methylated). Probes with over 70% of missing values and probes located at chromosomes X and Y were removed. The missing values of the remaining probes were imputed using the k-nearest neighbours (knn) imputation algorithm of the impute R package. Since DNA methylation in promoter regions would strongly influence gene expression, we focused on the methylation probes in promoter regions defined as 2.0 kb upstream to 0.5 kb downstream from transcription start sites (TSS). Batch effects were removed by the ComBat algorithm of the sva R package (Leek et al., 2012). Ultimately, 560 patients including methylation data (from 450k array) and corresponding clinical data, 986 patients (from TCGA database) containing both gene expression data and corresponding clinical data were used for mainly analysis. And we obtained 557 overlapping patients (from the above two datasets) with complete gene expression data, methylation data and clinical data. Moreover, RNA-seq data and clinical information of 106 samples from GSE146558 were downloaded from NCBI (GEO: https://www.ncbi.nlm.nih.gov/) as external validation dataset. The mRNA expression profile from GEO dataset was normalized by the Robust Multichip Average (RMA) algorithm with background adjustment, quantile normalization, and final summarization. The workflow of our study was illustrated in Figure 1.

Independent Prognostic CpG Loci Screening
Univariate Cox regression analysis was performed to screen out the prognosis-related CpGs of the 560 BC patients. In our study, we used the OS as clinical parameter of prognosis. Next, all these CpGs were subjected to multivariate Cox regression analysis, with age, pathological stage, and TNM stage as covariates to identify independent prognostic CpGs.

GO and KEGG Analysis
The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/) (Kanehisa et al., 2017) analysis were performed using the R ClusterProfiler package (Yu et al., 2012). A p value < 0.05 was set as the cut-off value for both GO and KEGG analyses in our study.

Consensus Clustering and Evaluation of CpG-Related Subtypes
Consensus clustering (Wilkerson and Hayes, 2010) was performed to determine subgroups of different methylation characteristics of the 560 BC patients based on the independent prognostic CpG loci using the ConsensusClusterPlus package of R. The criteria to determine the number of clusters were: (Freddie et al., 2020) The consistency within the cluster was relatively high; (DeSantis et al., 2019) There was no significant increase in the area under the CDF curve; (Allemani et al., 2018) The relative change in area under CDF curve tended to be stable. We then generated a consensus matrix to better visualize and help determine the number of clusters.

Construction and Validation of the Methylation Prognostic Model
Differentially methylated independent prognostic CpG loci between the different prognosis clusters were screened out using Wilcoxon test. The filtering conditions were false discovery rate (FDR) < 0.05 and | log2 (fold change) | >0.585. On the basis of these differentially methylated CpG loci, a methylation prognostic model of 560 BC patients was constructed using multivariate Cox analysis. The formula of the risk score was as follows: Risk score coef i β i where coef i was the multivariate Cox regression coefficient, and βi was the corresponding methylation β value. According to the median risk score (Chen et al., 2020b), (Shen et al., 2019), patients were divided into methylation low-risk (n 280) and high-risk (n 280) groups. Survival curves were employed to compare the OS of the two groups. Risk curve was plotted to visualize the relationship of the risk score, survival status and the methylated levels of the six signature CpG loci. Univariate and multivariate Cox regression analysis were performed to explore whether the risk score could be an independent predictor of OS. The sensitivity and specificity of the methylation prognostic model were evaluated by calculating the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.

Construction and Evaluation of Gene Prognostic Model
Differentially expressed genes (DEGs) were identified from methylation high-risk and low-risk groups ( Patients were also categorized into low-risk (n 493) and high-risk (n 493) groups based on the median risk score. Kaplan-Meier survival curve, risk curve, ROC curve, univariate and multivariate Cox regression analysis were also used for evaluating and validating the prognostic signature. Besides, two subgroups from 986 BC patients, the 557 patients' dataset and GSE146558 dataset were used to validate the prognostic value of the gene signature.

Antitumor Drug Sensitivity Analysis
CellMiner (https://discover.nci.nih.gov/cellminer/home.do) is a robust, user-friendly online database that integrates drug sensitivity and genomic data (Reinhold et al., 2017), (Wang et al., 2016). Anti-tumor activity data obtained from NCI-60 tumor cell line panel of the developmental therapeutics program (DTP) and RNA-seq data for the 60 cell lines of the NCI DTP drug screen were downloaded from this website. Subsequently, correlation between the sensitivity of anti-tumor drugs and the signature genes was analyzed.

Statistical Analysis
R 3.6.3 (version 3.6.3, https://pan.baidu.com/s/ 1sufVf2lmoj9GYG_j5_fJKQ) was used for statistical analysis and plotting. Consensus clustering was performed using the ConsensusClusterPlus package of R; COX regression analysis was performed with the coxph function in survival package of R (Zhang, 2016); Kaplan-Meier curve was plotted using the survival and survminer packages of R; Pheatmap was plotted using the pheatmap package of R; The forest plots were plotted by the forestplot package of R; ROC curve was plotted by the survival ROC package of R. GO and KEGG analyses were performed using the ClusterProfiler package of R.
Mann-Whitney test was used to estimate the statistical significance of two groups of skewed distributed continuous variables, and Kruskal-Wallis test was used to evaluate the statistical significance of multiple groups of skewed distributed continuous variables (with Bonferroni correction for pairwise comparisons among multiple groups). All tests were two-sided and for all statistical tests, p < 0.05 was considered to be statistically significant unless otherwise specified.

Screening of Prognosis-Related CpG Loci in Breast Cancer
In our study, 450k array dataset was defined as train group and 27k array dataset was defined as test group (Table 1). Firstly, 144 CpG loci with p < 0.001 by univariate Cox analysis were screened out and identified as prognosis-related CpG loci. Using age, pathological staging and TNM staging as covariates, 66 CpG loci (49 CpG loci associated with favorable prognosis, 17 CpG loci associated with poor prognosis) with p < 0.001 by multivariate Cox analysis were further selected and used as the methylation classification features (Figure 2).

Identification of DNA Methylation-Based Prognosis Subgroups
Then, the 560 patients were categorized into clusters of different methylation characteristics with consensus clustering based on the methylation of the 66 independent prognostic CpG loci. When the patients were assigned to 5 categories, the consistency within the clusters was high, the area under the cumulative distribution function (CDF) curve began to stabilize, and the relative change in area under CDF curve tended to be stable ( Figures 3A,B). A consensus matrix representing the consensus for k 5 also displayed a welldefined 5-block structure ( Figure 3C). Accordingly, the optimal number of clusters was determined to be 5.
Subsequently, we conducted a subgroup analysis for the 5 clusters. Firstly, we compared the methylation levels of these 66 independent prognostic methylation loci among the five clusters. As illustrated in Figures 3D,E, Cluster 5 had the lowest methylation levels, followed by Clusters 2 and 4, Clusters 1 and 3. And the methylated difference between Cluster 5 and each of the remaining clusters was statistically significant. To explore the prognostic significance of the five clusters, Kaplan-Meier survival analysis was performed. We found that the prognosis was statistically significantly different among the 5 clusters, where Cluster 1 and Cluster 3 had the best OS, while Cluster 5 had the worst ( Figure 3F).

Construction and Evaluation of Methylation Prognosis Model
The five clusters were significantly prognosis-associated, and therefore were used to identify potential prognostic biomarkers. Given that Cluster 5 had the lowest methylation level and the worst OS, it was reasonable to be selected as the reference cluster. Next, 20 differentially methylated independent prognostic CpG loci were identified from Cluster 5 and the rest clusters (Table 2). Ultimately, a methylation prognostic model was constructed which included six CpG loci (cg00945507, cg05406101, cg10092957, cg13060154, cg14992108, cg18678121) determined by multivariate Cox analysis ( Table 3). Kaplan-Meier analysis showed that cg00945507, cg05406101, cg10092957, cg14992108, cg18678121 were associated with improved survival, and cg13060154 was associated with poor survival (Figure 4).
Then we explored the mechanisms by which these signature CpG loci might act on BC. The six CpG loci, cg00945507, cg05406101, cg10092957, cg14992108, cg18678121, and cg13060154, were located at gene promoter regions of SEC61G, RWDD2B, NCCRP1, SNTB1, SEC61A2, DAB2IP, respectively. We firstly analyzed the correlation of these CpG loci and their corresponding target genes. The methylation of cg10092957, cg05406101, cg18678121, cg00945507 were moderately negatively correlated with the expression of their target genes. Whereas cg13060154 was weakly positively correlated with its corresponding gene (Supplementary Figures S1A-F). Consistent with the above results, the increased expression of NCCRP1, RWDD2B, SEC61A2, and SEC61G was associated with the decreased β values of cg10092957, cg05406101, cg18678121, and cg00945507, respectively. To the opposite of them, DAB2IP had higher expression in the presence of hypermethylated cg13060154 (Supplementary Figures S1G-L). However, there was no relationship between the cg14992108 and its target gene SNTB1. In addition, we took advantage of TCGA Wanderer, an interactive viewer exploring DNA methylation and gene expression data in human cancer (http://maplab.imppc.org/ wanderer/) (Díez-Villanueva et al., 2015), to explore the methylated difference of the six CpG loci between breast cancer and normal tissues. We found that the methylation levels of cg05406101, cg18678121, cg14992108 were higher in normal tissues. However, the methylation levels of cg13060154 and   (Table 3).
According to the formula, we computed the risk score of each BC patient in the train group (n 560) and assigned them into high-risk (n 280) and low-risk groups (n 280) with reference to the median risk score. The methylation levels of the six CpG loci between the high-risk group and low-risk group were showed in Supplementary Figures S2A. Kaplan-Meier curve indicated that the high-risk group had significantly poorer OS than the low-risk group ( Figure 5A). To confirm the methylation risk score could effectively predict the BC patients' prognosis, we plotted the ROC curve. Notably, we observed that the risk score had the highest prediction performance of prognosis compared with the conventional clinical features, with the 3-years and 5-years AUC values being 0.739 and 0.744 ( Figures 5B,C). The relationship of methylation risk score, survival status and methylation levels of the six signature CpG loci was shown in Figures 5D-F. Univariate Cox analysis indicated that age, stage, T stage, N stage and risk score were significantly associated with OS. However, when they were introduced into multivariate Cox analysis, only age [hazard ratio, 1.026 (95% CI, 1.008-1.045), p 0.005], and risk score [hazard ratio, 2.823 (95% CI, 2.131-3.741), p < 0.001] remained as independent prognostic predictors ( Figures 5G,H). Similar prognostic significance was observed in the validation cohort (27k array dataset), with AUC values of 0.603 and 0.657 for 3 and 5 years, respectively (Supplementary Figures S2B-D).

Identification of Differentially Expressed Genes From the Methylation High-Risk and Low-Risk Groups
Using the thresholds of adjusted p < 0.05 and | log2 (fold change) | >1, a total of 413 differentially expressed genes (DEGs) between methylation high-risk and low-risk groups were obtained. The volcano and heatmap visually displayed the DEGs (Figures 6A,B). To further investigate the biological characteristics of the DEGs, function and pathway annotations were performed. GO analysis indicated that these genes were involved in the regulation of cell cycle processes and mitotic cell

Construction and Evaluation of Gene Prognostic Signature
To explore the prognostic values of the identified 413 DEGs, we extracted the expression of these DEGs from the transcriptomic profiles of 986 BC patients in the TCGA-BRCA database for subsequent analysis. Firstly, 50 DEGs significantly related with OS were selected by Kaplan-Meier analysis (p < 0.05). All of these prognostic genes were subjected to univariate Cox regression analysis and 13 of them with p < 0.05 were further identified as prognosis-associated genes ( Supplementary Table S1). Ultimately, six prognosis-associated genes, namely IRF2, KCNJ11, ZDHHC9, LRP11, PCMT1, and TMEM70, were included in developing a gene prognostic signature by multivariate Cox regression analysis. Among them, four genes (ZDHHC9, LRP11, PCMT1, TMEM70) were related with poor survival and two genes (IRF2, KCNJ11) were associated with good survival (Supplementary Figures S3). The risk formula was as follows: Gene expression analysis of the six signature genes in the Oncomine database (https://www.oncomine.org) revealed that ZDHHC9, LRP11, PCMT1, TMEM70 were highly expressed in breast cancer, and IRF2, KCNJ11 were highly expressed in normal tissues ( Figure 7A). Then we explored the protein levels of these six genes between breast cancer and normal tissues in the Human Protein Atlas database (Uhlen et al., 2010) (HPA: https://www. proteinatlas.org/humanproteome/pathology). In accordance with the gene expression levels, the protein levels of ZDHHC9, PCMT1, TMEM70 were significantly higher in breast cancer, and the protein levels of IRF2, KCNJ11 were higher in normal tissues ( Figure 7B). Moreover, we further checked the prognostic values of our six genes in the public database TCGA portal (version 1.0) (http://tcgaportal.org/ TCGA/Breast_TCGA_BRCA/process.php), and we found  that ZDHHC9, LRP11, PCMT1, TMEM70 were associated with poor prognosis, while IRF2 and KCNJ11 were related with good prognosis of BC patients ( Figure 7C).

Association of Methylation Prognostic Model and Methylation-Based Gene Prognostic Model
Perhaps not surprisingly, a positive and significant correlation was observed between the two prognostic signatures, which was mainly reflected at the following aspects: firstly, a moderate correlation was found between the six signature CpG loci and six signature genes. Specifically, the expression of IFR2 was positively related with the methylation of cg00945507, cg05406101, cg14992108, and cg18678121, above of which were all good prognostic factors, while IFR2 expression was negatively related with the methylation of poor prognostic CpG locus: cg13060154; And KCNJ11 expression was positively correlated with the methylation of favorable prognostic CpG loci: cg05406101 and cg10092957, and negatively related with the methylation of cg13060154; To the opposite of these two genes, ZDHHC9 and TMEM70 expression Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 742578 8 were negatively related with the methylation of cg05406101 and cg18678121, and positively associated with the methylation of cg13060154; Similarly, the expression of PCMT1 was negatively correlated with the methylation of cg05406101, cg14992108, cg18678121; And LRP11 expression was negatively related with the methylation of cg00945507, cg05406101, cg18678121 (Supplementary Figures S4A, Supplementary Figures S4I).

Evaluation and Validation of the Gene Prognostic Signature
The expression of the six signature genes between the gene high-risk and low-risk groups was shown in Supplementary Figures S5A. Kaplan-Meier analysis showed that survival probability in the lowrisk group was higher ( Figure 8A). The AUC values of 3-years and 5-years OS were 0.725 and 0.715 ( Figures 8B,C). The risk score distribution, the survival status, and the expression of the six genes of 986 BC patients were visualized in Figures 8D-F. Univariate and multivariate Cox regression analyses indicated that the risk score was associated with OS and could be an independent prognostic predictor, with univariate hazard ratio, 1.187 (95% CI, 1.082-1.302, p < 0.001), multivariate hazard ratio, 1.199 (95% CI, 1.094-1.314), p < 0.001, respectively ( Figures 8G,H).
To confirm the prognostic value of the six-gene signature, we tested it with the validation subgroup comprising of 557 BC patients. The results were consistent with our previous findings. Univariate and multivariate COX regression analysis also showed that the risk score was an independent prognostic predictor of BC (Supplementary Figures S5D,E). Subsequently, the 986 BC patients' dataset was randomly assigned into two test subgroups [test group one (n 494) and test group two (n 492)] which were with balanced baseline characteristics (Table 4), and both of them were used for validating the gene signature. The two subgroups could also distinguish the favorable OS patients from the poor OS patients (Supplementary Figures S5F,G). AUC values of 1 year, 3 years, 5 years were 0.711, 0.757, 0.721 in test group one, and 0.864, 0.733, 0.702 in test group two, respectively (Supplementary Figures S5I,J). Moreover, the external dataset GSE146558 further confirmed our gene prognostic signature. In high-risk group of GSE146558 dataset, patients were with poorer OS (Supplementary Figures S5H). And AUC value of 3 years was 0.634 (Supplementary Figures S5K).
In addition, we confirmed the prognostic value of the six-gene prognostic signature in the subgroups of BC patients presented  Figure S6).
Considering the important roles of BRCA1, BRCA2, CDH1, PTEN, TP53, PIK3CA in BC, we also evaluated these gene expression between gene high-risk and low-risk groups, and observed that the expression of oncogenes such as BRCA1, BRCA2 and CDH1 were significantly higher in high-risk group. On the other hand, the expression of the tumor suppressor gene PTEN was significantly higher in low-risk group (Figure 9).

Gene Set Enrichment Analysis
To investigate potential functions and signaling pathways related to the six-prognosis signature, we performed Gene Set Enrichment Analysis (GSEA: http://www.gsea-msigdb.org/gsea/ index.jsp). Notably, we found that more tumor-related GO terms and KEGG pathways were associated with low-risk group ( Figure 10A). In detail, low-risk group was mainly associated with the function of regulating epithelial and endothelial cell migration, and high-risk group was related with nuclear chromosome condensing, protein folding. Pathway enrichment analysis indicated that JAK/STAT signaling pathway, cell adhesion molecule signaling pathway, VEGF signaling pathway, and MAPK signaling pathway were active in the low-risk group. On the other hand, P53 signaling pathway was active in the high-risk group ( Figure 10B).

Correlation Analysis of the Six Signature Genes and the Sensitivity of Anti-Tumor Drugs
Correlation analysis between the expression of the six prognosis genes and the sensitivity of anti-tumor drugs was performed based on the CellMiner database (https://discover.nci.nih.gov/ cellminer/), and the results indicated that our signature genes were moderately correlated with the response of some common anti-tumor drugs such as PARP inhibitor (Olaparlib), chemotherapy drugs (Fluorouracil, Decitabine, Oxaliplatin), which might imply potential value in anti-tumor therapy ( Figure 11).

DISCUSSION
With the advent of next-generation sequencing, genome-wide DNA methylation profile analysis has become possible. Multiple studies have suggested that DNA methylation plays an important   (Győrffy et al., 2016). Potential targets for immunotherapy are still being explored. Recent studies have shown that immune cell infiltration might be a biomarker for immunotherapy. Importantly, the methylation of immune genes could also highly sensitively reflect the presence of tumor infiltrating lymphocytes. Thus, DNA methylation profiles could  be used to predict the proportion of all kinds of immune cells in the tumor microenvironment (Győrffy et al., 2016), (Jeschke et al., 2015). Given the important role of DNA methylation, it is not surprising that a better understanding of the DNA methylation and the exploration of the interaction mechanism between genes and methylation are crucial for BC patients. DNA methylation has a substantial impact on gene expression, and affects the prognosis of different subtypes of BC patients (44). In this study, we obtained six prognosis-related CpG loci, cg00945507, cg05406101, cg10092957, cg14992108, cg18678121, cg13060154, respectively targeting SEC61G, RWDD2B, NCCRP1, SNTB1, SEC61A2, DAB2IP genes. SEC61G was found to be overexpressed in BC and might coamplify with epidermal growth factor receptor (EGFR) (Reis-Filho et al., 2006). Lu et al. reported that the expression of SEC61G in BC was negatively correlated with its promoter methylation (Lu et al., 2021). In our research, similar trend could be found that the expression of SEC61G was negatively related with the methylation of cg00945507. Moreover, the methylation level of SEC61G was positively correlated with the prognosis of patients with glioma (Liu et al., 2019a). Miwa et al. proved that NCCRP1 transcription was inhibited by promoter hypermethylation in esophageal squamous cell carcinoma (Miwa et al., 2017). And high expression of NCCRP1 in patients with pancreatic cancer was associated with a poor prognosis (Zuo et al., 2020). In our study, we observed that the expression of NCCRP1 was negatively correlated with the methylation of cg10092957. DAB2IP was a candidate tumor suppressor gene and its expression down-regulation mechanism was mainly through the promoter hypermethylation (Qiu et al., 2007). Demethylation of DAB2IP gene weakened the EMT process and suppressed hepatocellular carcinoma growth (Liu et al., 2019b). However, we observed a weak positive correlation between DAB2IP expression and cg13060154 methylation in our study. Regrettably, studies on the correlation between the expression of SEC61A2, RWDD2B, SNTB1 and DNA methylation in tumors were insufficient.
On the bias of the six prognostic CpG loci, we developed a methylation risk model that could accurately classify BC patients with different death risk. Subsequently, we identified 413 DEGs from the methylation high-risk and low-risk groups. Function enrichment analysis indicated that these DEGs were related with cell cycle checkpoint, ubiquitin-like protein ligase binding. KEGG pathway analysis showed these genes were mainly enriched in p53 signaling pathway, and TGF-β signaling pathway. The above functions and pathways were common and critical in tumor proliferation, invasion, and metastasis. And then, we further extracted the expression of the 413 DEGs from the transcriptomic profiles of 986 BC patients to evaluate the prognostic roles of these DEGs in TCGA-BRCA dataset. The prognostic value of individual genes or gene signatures has been extensively studied in cancers (Parker et al., 2009). Herein, we got a methylation-based gene prognostic signature using multivariate Cox analysis.
The six signature genes were composed of IRF2, ZDHHC9, KCNJ11, LRP11, PCMT1, and TMEM70. IRF2, a transcription factor in the interferon gamma signal transduction pathway, was FIGURE 9 | The differential expression of breast cancer-associated genes in gene high-risk and low-risk groups.
Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 742578 different expression in breast cancer and normal tissues. Kriegsman et al. found that IRF2, which positively regulated the MHC class I pathway and negatively regulated PD-L1 expression, had good implications for immunotherapy and prognosis of BC (Kriegsman et al., 2019). ZDHHC9, one of risk genes of BC, was found to participate in palmitoylating PD-L1 to keep its protein stability, leading to immune escape. Inhibiting the ZDHHC9 expression made breast cancer cells susceptible to T cell killing and inhibited tumor growth. Thus, ZDHHC9 could be a biomarker of immunotherapy response (Yang et al., 2019). KCNJ11 played a key role in glucosestimulated insulin secretion (Cook and Hales, 1984). It is well established that diabetes is closely related to a variety of tumors (Giovannucci et al., 2010), and the mortality is higher among women with longer diabetes duration in BC (Lega et al., 2018). Therefore, diabetes-related genes KCNJ11 may also be a potential prognostic biomarker of BC. Yet, the relationship between KCNJ11 and breast cancer has not been systematically reported. PCMT1 has gradually been considered as a risk gene in tumors. Study demonstrated that BC patients with higher PCMT1 expression had a poorer prognosis (Dong et al., 2021). Furthermore, the different expression of the six signature genes in mRNA and protein levels was validated in public databases. The methylation-based gene signature could also distinguish BC patients with a significantly increased risk from those with a decreased risk. Moreover, correlation analysis showed that the methylation of the six signature CpGs were closely correlated with the expression of the six signature genes, and the established gene risk score was significantly positively correlated with the methylation risk score. Multigene analysis has been popularized to predict the response of anti-tumor therapy and prognosis in BC. For instance, the EndoPredict score (12-gene molecular signature) has been used to predict the survival without distant recurrence up to 15 years after diagnosis. Recently, the 12-gene MS has also been proven to predict the response to neoadjuvant chemotherapy (NaCT) and neoendocrine therapy (NET) in HR+, her2-BC patients, with AUC values being Frontiers in Genetics | www.frontiersin.org December 2021 | Volume 12 | Article 742578 14 0.736 for NaCT, 0.726 for NET (Dubsky et al., 2020). Other widely used multigene assays involve Oncotype Dx, MammaPrint, and PAM50 which have been validated to predict the treatment response, recurrence, prognosis in BC patients. Among these multi-gene tests, MammaPrint has the best predictive performance (AUC 0.88), following by Oncotype Dx (AUC 0.76), PAM50 risk of relapse based on subtype (ROR-S) (AUC 0.68) and the PAM50 risk of relapse based on subtype and proliferation (ROR-P) (AUC 0.55) (Grimm and Mazurowski, 2020). Our study developed methylation and methylation-based prognostic signatures, both of which had excellent performance in predicting the prognosis of BC patients with 3-years, 5-years AUC values being 0.739, 0.744 for methylation signature, and 0.725, 0.715 for methylation-based gene signature. Three TCGA-BRCA subgroups were used to validate the gene prognostic signature and all of them showed powerful prediction effects with 3-years AUC values of 0.757, 0.735, 0.733, respectively. Moreover, the external dataset GSE146558 was also used to validate our gene prognostic signature. Due to small sample size (n 106) and inter-dataset heterogeneity, we could not obtain a higher AUC value in the validation set, although the 3-years AUC value being 0.634 was still statistically significant.
Increasing researches reported that CDH1, BRCA2, and BRCA1 were susceptibility genes for BC (Petridis et al., 2019) and around 60-70% of women with BRCA1 or BRCA2 gene mutations would be suffered with BC in her lifetime (Antoniou et al., 2003). Besides, BRCA2 mutation carriers were more likely to develop brain metastases than non-carriers . PTEN is a tumor suppressor gene in BC, and researches proved that lack or decrease of PTEN expression might be associated with poor prognosis in BC (Luen et al., 2018). Then we examined the expression alteration of these genes in gene high-risk and low-risk groups to understand the contribution of the gene signature to the carcinogenesis of BC. As expected, the expression of proto-oncogenes BRCA1, BRCA2, and CDH1 was significantly higher in high-risk group. Conversely, expression of tumor suppressor gene PTEN was significantly higher in low-risk group.
There were some advantages of our study. First of all, we were the first one to discuss the prognostic roles of CpG loci in breast cancer, and constructed methylation-associated signatures. Secondly, these two prognostic signatures were positively correlated with each other and both of them could accurately discriminate breast cancer patients with different death risk. Besides, three subgroups of TCGA dataset and an external dataset GSE146558 were verified the prognostic value of our gene signature. Finally, the above results, together with risk gene expression verification, GSEA, drug sensitivity analysis, might provide novel treatment and prognosis biomarkers for breast cancer patients. We believe with the advent of the era of precision medicine, clinical trials could be designed using gene signaturebased risk scores to select the patients most likely to develop poor prognosis in which to develop novel or more intensive postoperative therapies in future.
One major limitation for our study was that data in our study was downloaded from the public databases, the mechanism of the six signature genes and six signature CpGs affecting the occurrence and development of breast cancer still needs to be further verified by vivo and vitro experiments. Even, prospective clinical trials are needed to check the prognostic values of these two signatures.

CONCLUSION
Taken together, we proposed two methylation-related prognostic signatures. These two signatures were significantly positively correlated with each other and both of them could predict the prognosis of BC patients more accurately than traditional clinical predictors. Importantly, the six key genes (IRF2, ZDHHC9, KCNJ11, LRP11, PCMT1, TMEM70) of gene prognostic signature may act as potential prognostic biomarkers and therapeutic targets.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding authors.

ETHICS STATEMENT
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
Conceptualization, CZ, QW, and YZ; methodology, CZ, QW, and YZ; Software, CZ and QW; Validation, CZ, ZZ, SZ, and DL; formal analysis, CZ, and QW; investigation, CZ, QW, and SZ; resources, CZ, QW, and DL; data curation, CZ and QW.; Visualization, CZ, QW, and NY.; Writing-Original Draft Preparation, CZ, QW, and YZ; Writing-Review and Editing, CZ, QW, and YZ; Supervision, CZ, QW, and YZ; Project Administration, CZ, QW and YZ; All authors have read and agreed to the published version of the manuscript. Supplementary Figure S6 | The overall survival analysis of breast cancer patients with different clinical characteristics in gene high-risk and low-risk groups.